Data integration

The example below shows how to get a pre-configured repository by name and list the resources it contains.


In [1]:
from bdkd import datastore as ds

# This repository is pre-configured
repo = ds.repositories().get('bdkd-laser-public')
# List all resources in the repository
repo.list()


Out[1]:
[u'datasets/Sample dataset']

Resources can be acquired from the repository by name.

The metadata for a resource is available via .metadata: any key/value pairs that were set when the resource was created.


In [2]:
dataset = repo.get('datasets/Sample dataset')
dataset.metadata


Out[2]:
{u'created-at': u'2013-06-05',
 u'created-by': u'Dr Josh Toomey',
 u'shard-size': 9}

A resource consists of one or more files. In this example (an experiment) there are many files, including raw data, maps and a README.


In [3]:
len(dataset.files)


Out[3]:
1094

We can narrow down the list of files. In this case we filter by name: we know that files containing "FB_" in their name are the raw data files.


In [4]:
raw_files = dataset.files_matching('FB_')
len(raw_files)


Out[4]:
1092
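In plain Python, this kind of name-based filtering is equivalent to a substring test over a list of file names (the names below are illustrative, not the full dataset):

```python
filenames = [
    "FB_000_INJ_000_09.hdf5",
    "FB_001_INJ_000_09.hdf5",
    "maps.hdf5",
    "README.txt",
]

# Keep only the raw data files, identified by "FB_" in the name
raw = [n for n in filenames if "FB_" in n]
```
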

So far we've been manipulating the files in the abstract: none of the actual file data has been retrieved from the repository.

This is where the magic happens...

The method .local_path() is special: it fetches the file and caches it in the local filesystem, returning a path via which the file can be opened. Let's do that for the first raw file.


In [6]:
path_to_file = raw_files[0].local_path()
print path_to_file


/var/tmp/bdkd/cache/1000/bdkd-laser-public/files/datasets/Sample dataset/FB_000_INJ_000_09.hdf5
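Conceptually, .local_path() follows a fetch-and-cache pattern: download the object once, then return the cached path on subsequent calls. A minimal sketch under that assumption (a hypothetical helper, not the library's actual implementation):

```python
import os

def local_path(repo_name, file_key, fetch, cache_root="/var/tmp/bdkd/cache"):
    """Return a local path for file_key, calling fetch(file_key, dest)
    to download the data only if it is not already cached."""
    dest = os.path.join(cache_root, repo_name, "files", file_key)
    if not os.path.exists(dest):
        # Create the cache directory tree on first use
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        fetch(file_key, dest)
    return dest
```

Repeated calls are then cheap: only the first call pays the download cost.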

We can then open the file and work with it. Example:


In [7]:
import h5py

h5f = h5py.File(path_to_file, 'r')
# 'name' avoids shadowing the 'ds' module alias imported earlier
for name in h5f:
    inj = h5f[name].attrs['INJ']
    fb = h5f[name].attrs['FB']
    print "Feedback: " + str(fb) + " injection: " + str(inj)
h5f.close()


Feedback: 0 injection: 0
Feedback: 0 injection: 1
Feedback: 0 injection: 2
Feedback: 0 injection: 3
Feedback: 0 injection: 4
Feedback: 0 injection: 5
Feedback: 0 injection: 6
Feedback: 0 injection: 7
Feedback: 0 injection: 8
Feedback: 1 injection: 0
Feedback: 1 injection: 1
Feedback: 1 injection: 2
Feedback: 1 injection: 3
Feedback: 1 injection: 4
Feedback: 1 injection: 5
Feedback: 1 injection: 6
Feedback: 1 injection: 7
Feedback: 1 injection: 8
Feedback: 2 injection: 0
Feedback: 2 injection: 1
Feedback: 2 injection: 2
Feedback: 2 injection: 3
Feedback: 2 injection: 4
Feedback: 2 injection: 5
Feedback: 2 injection: 6
Feedback: 2 injection: 7
Feedback: 2 injection: 8
Feedback: 3 injection: 0
Feedback: 3 injection: 1
Feedback: 3 injection: 2
Feedback: 3 injection: 3
Feedback: 3 injection: 4
Feedback: 3 injection: 5
Feedback: 3 injection: 6
Feedback: 3 injection: 7
Feedback: 3 injection: 8
Feedback: 4 injection: 0
Feedback: 4 injection: 1
Feedback: 4 injection: 2
Feedback: 4 injection: 3
Feedback: 4 injection: 4
Feedback: 4 injection: 5
Feedback: 4 injection: 6
Feedback: 4 injection: 7
Feedback: 4 injection: 8
Feedback: 5 injection: 0
Feedback: 5 injection: 1
Feedback: 5 injection: 2
Feedback: 5 injection: 3
Feedback: 5 injection: 4
Feedback: 5 injection: 5
Feedback: 5 injection: 6
Feedback: 5 injection: 7
Feedback: 5 injection: 8
Feedback: 6 injection: 0
Feedback: 6 injection: 1
Feedback: 6 injection: 2
Feedback: 6 injection: 3
Feedback: 6 injection: 4
Feedback: 6 injection: 5
Feedback: 6 injection: 6
Feedback: 6 injection: 7
Feedback: 6 injection: 8
Feedback: 7 injection: 0
Feedback: 7 injection: 1
Feedback: 7 injection: 2
Feedback: 7 injection: 3
Feedback: 7 injection: 4
Feedback: 7 injection: 5
Feedback: 7 injection: 6
Feedback: 7 injection: 7
Feedback: 7 injection: 8
Feedback: 8 injection: 0
Feedback: 8 injection: 1
Feedback: 8 injection: 2
Feedback: 8 injection: 3
Feedback: 8 injection: 4
Feedback: 8 injection: 5
Feedback: 8 injection: 6
Feedback: 8 injection: 7
Feedback: 8 injection: 8
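The attribute-access pattern used above can be tried end-to-end on a small synthetic file (the file and dataset names here are made up for illustration, not part of the sample dataset):

```python
import h5py
import numpy as np

# Write a small synthetic HDF5 file with FB/INJ attributes on each dataset
with h5py.File("demo.hdf5", "w") as h5f:
    for fb in range(2):
        for inj in range(3):
            d = h5f.create_dataset("series_%d_%d" % (fb, inj), data=np.zeros(4))
            d.attrs["FB"] = fb
            d.attrs["INJ"] = inj

# Read the attributes back, mirroring the loop above
with h5py.File("demo.hdf5", "r") as h5f:
    for name in h5f:
        print("Feedback: %d injection: %d" % (h5f[name].attrs["FB"],
                                              h5f[name].attrs["INJ"]))
```
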

Dataset maps

The dataset should contain a file called "maps.hdf5" that contains all the feedback/injection maps calculated for the dataset.

We acquire and cache this file, open it, then list its contents.


In [3]:
import h5py

maps = h5py.File(dataset.file_ending('maps.hdf5').local_path())
maps.items()


Out[3]:
[(u'AVG_map.csv', <HDF5 dataset "AVG_map.csv": shape (251, 351), type "<f8">),
 (u'FBT_map.csv', <HDF5 dataset "FBT_map.csv": shape (251, 351), type "<f8">),
 (u'INJ_map.csv', <HDF5 dataset "INJ_map.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t01.csv',
  <HDF5 dataset "PE_map_m5t01.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t02.csv',
  <HDF5 dataset "PE_map_m5t02.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t03.csv',
  <HDF5 dataset "PE_map_m5t03.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t04.csv',
  <HDF5 dataset "PE_map_m5t04.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t05.csv',
  <HDF5 dataset "PE_map_m5t05.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t06.csv',
  <HDF5 dataset "PE_map_m5t06.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t07.csv',
  <HDF5 dataset "PE_map_m5t07.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t08.csv',
  <HDF5 dataset "PE_map_m5t08.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t09.csv',
  <HDF5 dataset "PE_map_m5t09.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t10.csv',
  <HDF5 dataset "PE_map_m5t10.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t22.csv',
  <HDF5 dataset "PE_map_m5t22.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t23.csv',
  <HDF5 dataset "PE_map_m5t23.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t24.csv',
  <HDF5 dataset "PE_map_m5t24.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t44.csv',
  <HDF5 dataset "PE_map_m5t44.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t45.csv',
  <HDF5 dataset "PE_map_m5t45.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t46.csv',
  <HDF5 dataset "PE_map_m5t46.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t67.csv',
  <HDF5 dataset "PE_map_m5t67.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t68.csv',
  <HDF5 dataset "PE_map_m5t68.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t89.csv',
  <HDF5 dataset "PE_map_m5t89.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t90.csv',
  <HDF5 dataset "PE_map_m5t90.csv": shape (251, 351), type "<f8">),
 (u'PE_map_m5t91.csv',
  <HDF5 dataset "PE_map_m5t91.csv": shape (251, 351), type "<f8">),
 (u'RMS_map.csv', <HDF5 dataset "RMS_map.csv": shape (251, 351), type "<f8">)]

We are interested in the feedback and injection maps: they provide the coordinates used for plotting.

We will also get one set of permutation entropy calculations -- in this case "PE_map_m5t01.csv".


In [4]:
FBT = maps['FBT_map.csv'][()]
INJ = maps['INJ_map.csv'][()]

pes = maps['PE_map_m5t01.csv'][()]
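Permutation entropy (Bandt-Pompe) quantifies the complexity of a time series from the distribution of ordinal patterns in it. As a rough sketch of how a single map value of this kind could be computed (the "m5t01" suffix is assumed to encode embedding dimension m=5 and delay 1; this is illustrative, not the pipeline that produced the stored maps):

```python
import math

def permutation_entropy(x, m=5, tau=1):
    """Normalised permutation entropy of sequence x,
    with embedding dimension m and delay tau."""
    counts = {}
    n = len(x) - (m - 1) * tau
    for i in range(n):
        # Ordinal pattern: the ranking of the m delayed samples
        pattern = tuple(sorted(range(m), key=lambda k: x[i + k * tau]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = [c / float(n) for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(m))  # normalise to [0, 1]
```

A perfectly monotonic series contains a single ordinal pattern and so has entropy 0; noisier dynamics push the value towards 1.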

In [5]:
import matplotlib.pyplot as plt
plt


Out[5]:
<module 'matplotlib.pyplot' from '/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc'>

In [7]:
%matplotlib inline
import numpy as np

mapX = np.array(INJ)
mapY = np.array(FBT)
mapZ = np.array(pes)
fig = plt.figure(figsize=(4.07, 4.40), dpi=100)
plt.pcolor(mapX, mapY, mapZ)
plt.xlim(np.min(mapX), np.max(mapX))
plt.ylim(np.min(mapY), np.max(mapY))
plt.colorbar()


Out[7]:
<matplotlib.colorbar.Colorbar instance at 0x7f90d5d74cb0>
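The same pcolor pattern can be exercised off-line with synthetic grids (the Agg backend, grid sizes and output file name are choices for a headless run, not part of the original notebook):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the INJ/FBT coordinate grids and the PE values
x, y = np.meshgrid(np.linspace(0, 1, 35), np.linspace(0, 1, 25))
z = np.sin(4 * np.pi * x) * np.cos(4 * np.pi * y)

fig = plt.figure(figsize=(4.07, 4.40), dpi=100)
plt.pcolormesh(x, y, z, shading="auto")
plt.xlim(x.min(), x.max())
plt.ylim(y.min(), y.max())
plt.colorbar()
plt.savefig("pe_map_demo.png")
```
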